Search Results
Search for: All records
Page 1 of 1
Total Resources: 3
Khosravi, H. (Ed.) Despite a tremendous increase in the use of video for conducting research in classrooms as well as for preparing and evaluating teachers, there remain notable challenges to using classroom videos at scale, including time and financial costs. Recent advances in artificial intelligence could make the process of analyzing, scoring, and cataloguing videos more efficient. These advances include natural language processing, automated speech recognition, and deep neural networks. To train artificial intelligence to accurately classify activities in classroom videos, humans must first annotate a set of videos in a consistent way. This paper describes our investigation of inter-annotator reliability for the identification and duration of activities among annotators with and without experience analyzing classroom videos. The validity of human annotations is crucial for temporal analysis in classroom video research. The study reported here represents an important step towards applying methods developed in other fields to validate temporal analytics within learning analytics research for classifying time- and event-based activities in classroom videos.
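The reliability measurement this abstract describes can be illustrated with a small sketch. The Python example below is not the authors' pipeline; the activity labels, segment boundaries, one-second bin size, and choice of agreement statistic (Cohen's kappa) are all assumptions made for illustration. It discretizes two annotators' activity segments over a video timeline and compares the resulting per-second labels.

```python
# Hypothetical sketch (not the paper's pipeline): compare two annotators'
# time-stamped activity segments for one classroom video by discretizing the
# timeline into one-second bins and computing raw agreement and Cohen's kappa.
from collections import Counter

def to_bins(segments, duration_s):
    """Expand (start_s, end_s, activity) segments into one label per second."""
    labels = ["none"] * duration_s
    for start, end, activity in segments:
        for t in range(int(start), min(int(end), duration_s)):
            labels[t] = activity
    return labels

def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length label sequences."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    expected = sum(counts_a[k] * counts_b.get(k, 0) for k in counts_a) / (n * n)
    return (observed - expected) / (1 - expected) if expected < 1 else 1.0

# Invented example: two annotators segmenting the same 600-second video.
annotator_1 = [(0, 120, "lecture"), (120, 400, "group_work"), (400, 600, "discussion")]
annotator_2 = [(0, 110, "lecture"), (110, 420, "group_work"), (420, 600, "discussion")]

a = to_bins(annotator_1, 600)
b = to_bins(annotator_2, 600)
print("raw agreement:", round(sum(x == y for x, y in zip(a, b)) / len(a), 3))
print("Cohen's kappa:", round(cohens_kappa(a, b), 3))
```

Binning the timeline turns agreement on activity duration into a per-label classification comparison, so standard chance-corrected statistics apply; event-based or boundary-sensitive comparisons would need a different treatment.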
Carpenter, Z.; Wang, Y.; DeLiema, D.; Kendeou, P.; Shaffer, D.W. (LAK23 Conference Proceedings: Toward Trustworthy Learning Analytics: The Thirteenth International Conference on Learning Analytics & Knowledge) Hilliger, I.; Khosravi, H.; Rienties, B.; Dawson, S. (Ed.)
Olney, Andrew M (CEUR Workshop Proceedings) Moore, S.; Stamper, J.; Cao, T.; Liu, Z.; Hu, X.; Lu, Y.; Liang, J.; Khosravi, H.; Denny, P.; Singh, A. (Ed.) Multiple choice questions are traditionally expensive to produce. Recent advances in large language models (LLMs) have led to fine-tuned LLMs that generate questions competitive with human-authored questions. However, the relative capabilities of ChatGPT-family models have not yet been established for this task. We present a carefully controlled human evaluation of three conditions: a fine-tuned, augmented version of Macaw; instruction-tuned Bing Chat with zero-shot prompting; and human-authored questions from a college science textbook. Our results indicate that on six of seven measures tested, the performance of both LLMs was not significantly different from human performance. Analysis of LLM errors further suggests that Macaw and Bing Chat have different failure modes for this task: Macaw tends to repeat answer options, whereas Bing Chat tends not to include the specified answer in the answer options. For Macaw, removing error items from the analysis results in performance on par with humans for all metrics; for Bing Chat, removing error items improves performance but does not reach human-level performance.
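The failure modes reported in this record lend themselves to automated screening before human evaluation. The sketch below is hypothetical (the item structure, field names, and example questions are invented, not the paper's code); it flags generated multiple-choice items that repeat an answer option or omit the keyed answer from the option list.

```python
# Hypothetical sketch (not the paper's evaluation code): screen generated
# multiple-choice items for the two failure modes described above -- repeated
# answer options (Macaw-style) and a keyed answer missing from the options
# (Bing Chat-style).
from dataclasses import dataclass, field
from typing import List

@dataclass
class MCQItem:
    stem: str
    options: List[str]
    answer: str
    flags: List[str] = field(default_factory=list)

def check_item(item: MCQItem) -> MCQItem:
    normalized = [opt.strip().lower() for opt in item.options]
    if len(set(normalized)) < len(normalized):
        item.flags.append("duplicate_options")
    if item.answer.strip().lower() not in normalized:
        item.flags.append("answer_not_in_options")
    return item

# Invented example items for illustration only.
items = [
    MCQItem("Which organelle produces most of a cell's ATP?",
            ["Mitochondrion", "Ribosome", "Mitochondrion", "Nucleus"],
            "Mitochondrion"),
    MCQItem("Which gas do plants take in during photosynthesis?",
            ["Oxygen", "Nitrogen", "Helium", "Argon"],
            "Carbon dioxide"),
]

for item in map(check_item, items):
    print(item.stem, "->", item.flags or ["ok"])
```

Normalizing options before comparison (case and surrounding whitespace here) is deliberate so that superficially different duplicates are still caught; a fuller screener might also normalize punctuation or numeric formatting.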